##        X          fixed.acidity   volatile.acidity  citric.acid   
##  Min.   :   1.0   Min.   : 4.60   Min.   :0.1200   Min.   :0.000  
##  1st Qu.: 400.5   1st Qu.: 7.10   1st Qu.:0.3900   1st Qu.:0.090  
##  Median : 800.0   Median : 7.90   Median :0.5200   Median :0.260  
##  Mean   : 800.0   Mean   : 8.32   Mean   :0.5278   Mean   :0.271  
##  3rd Qu.:1199.5   3rd Qu.: 9.20   3rd Qu.:0.6400   3rd Qu.:0.420  
##  Max.   :1599.0   Max.   :15.90   Max.   :1.5800   Max.   :1.000  
##  residual.sugar     chlorides       free.sulfur.dioxide
##  Min.   : 0.900   Min.   :0.01200   Min.   : 1.00      
##  1st Qu.: 1.900   1st Qu.:0.07000   1st Qu.: 7.00      
##  Median : 2.200   Median :0.07900   Median :14.00      
##  Mean   : 2.539   Mean   :0.08747   Mean   :15.87      
##  3rd Qu.: 2.600   3rd Qu.:0.09000   3rd Qu.:21.00      
##  Max.   :15.500   Max.   :0.61100   Max.   :72.00      
##  total.sulfur.dioxide    density             pH          sulphates     
##  Min.   :  6.00       Min.   :0.9901   Min.   :2.740   Min.   :0.3300  
##  1st Qu.: 22.00       1st Qu.:0.9956   1st Qu.:3.210   1st Qu.:0.5500  
##  Median : 38.00       Median :0.9968   Median :3.310   Median :0.6200  
##  Mean   : 46.47       Mean   :0.9967   Mean   :3.311   Mean   :0.6581  
##  3rd Qu.: 62.00       3rd Qu.:0.9978   3rd Qu.:3.400   3rd Qu.:0.7300  
##  Max.   :289.00       Max.   :1.0037   Max.   :4.010   Max.   :2.0000  
##     alcohol         quality     
##  Min.   : 8.40   Min.   :3.000  
##  1st Qu.: 9.50   1st Qu.:5.000  
##  Median :10.20   Median :6.000  
##  Mean   :10.42   Mean   :5.636  
##  3rd Qu.:11.10   3rd Qu.:6.000  
##  Max.   :14.90   Max.   :8.000

Univariate

Quality Histogram

Most of the wines are of quality 5 or 6, which also shows that we don’t
have enough data for every quality level.
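
The imbalance described above can be checked by counting the wines at each
quality level. This is a minimal sketch, assuming the data is in a data frame
with the `quality` column from the summary. The name `wine` and the values
below are made-up placeholders, not the real data.

```r
# Placeholder data frame: made-up quality scores, NOT the real wine data.
wine <- data.frame(quality = c(5, 6, 5, 6, 7, 5, 6, 4, 5, 6, 3, 8))

# Count the wines at each quality level.
quality_counts <- table(wine$quality)
print(quality_counts)

# Share of wines at each level, as percentages rounded to one decimal place.
quality_share <- round(100 * prop.table(quality_counts), 1)
print(quality_share)
```

On the real data, the same two tables would show how thin the extreme quality
levels are compared with 5 and 6.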

Sulphate histogram

The sulphates histogram has outliers: a few wines have much higher sulphate
values than the rest.
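
One common way to back up the outlier reading is the 1.5 x IQR rule (Tukey's
fences). The sketch below uses only the sulphates quartiles and maximum printed
in the summary above. The choice of this rule is an assumption, not necessarily
the criterion used for the plot.

```r
# Quartiles and maximum of sulphates, copied from the summary() output above.
q1 <- 0.55
q3 <- 0.73
max_sulphates <- 2.00

# Tukey's rule: values beyond Q3 + 1.5 * IQR are commonly flagged as outliers.
iqr <- q3 - q1
upper_fence <- q3 + 1.5 * iqr

print(upper_fence)                  # fence is about 1.0
print(max_sulphates > upper_fence)  # the maximum lies beyond the fence
```

With an upper fence of about 1.0, the maximum of 2.0 sits far outside it, which
agrees with what the histogram shows.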

Sugar Histogram

The residual sugar of most wines lies between about 1.5 and 2.5, which shows
that most of the wines are low in sugar.

Fixed Acidity

Fixed acidity is roughly normally distributed, and a few wines have high
acidity, up to about 16.

Volatile Acidity

We can see that volatile acidity is roughly normally distributed, with most
values between 0.4 and 0.8.

Citric Acid

We can clearly see that citric acid is right-skewed (most values sit at the
low end, with a tail to the right) and that the distribution is widely spread.

Chlorides

We can clearly see that chlorides is right-skewed (most values sit at the low
end) and has some outliers. Its interquartile range runs from 0.07 to 0.09.
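
A quick check on the direction of skew, without the plot, is to compare the
mean with the median. The sketch below uses only values from the summary
above. Treat it as a rule of thumb rather than a proof: a mean above the median
usually points to a longer right tail, with most values bunched at the low end.

```r
# Mean and median copied from the summary() output above.
stats <- data.frame(
  variable = c("citric.acid", "chlorides", "alcohol"),
  mean     = c(0.271, 0.08747, 10.42),
  median   = c(0.260, 0.07900, 10.20)
)

# Heuristic: mean > median suggests a longer right tail,
# mean < median suggests a longer left tail.
stats$tail <- ifelse(stats$mean > stats$median, "right", "left")
print(stats)
```

For all three variables the mean sits above the median, so by this heuristic
the long tail is on the right-hand side.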

Free Sulfur Dioxide

Its interquartile range runs from 7 to 21.

Density

Density has an interquartile range from 0.9956 to 0.9978, a mean of 0.9967,
and a maximum of 1.0037.

pH

We can see that the interquartile range runs from 3.2 to 3.4.

Alcohol

We can see that alcohol is right-skewed (most values sit at the low end) with
a few outliers, and its interquartile range runs from 9.5 to 11.1.

Univariate Analysis Discussion

What is the structure of your dataset?

The dataset has 1599 observations of 12 variables (plus the row index X), but
the observations are not equally distributed across the quality levels.

What is/are the main feature(s) of interest in your dataset?

I would like to find out which factors actually determine the quality of a
wine.

Of the features you investigated, were there any unusual distributions?
Did you perform any operations on the data to tidy, adjust, or change the form
of the data? If so, why did you do this?

Residual sugar has a few outliers, so I subset the data to residual sugar
less than 8 and plotted it again to see the distribution of sugar.
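
The subsetting step described above could look like the following sketch. The
data frame name `wine` and its values are made-up placeholders, not the real
data. Only the `residual.sugar` column name and the cutoff of 8 come from the
text.

```r
# Placeholder data frame: made-up residual sugar values, NOT the real wine data.
wine <- data.frame(
  residual.sugar = c(1.9, 2.2, 2.6, 1.8, 2.0, 2.4, 9.1, 15.5, 2.1, 3.0)
)

# Keep only the wines with residual sugar below 8.
wine_low_sugar <- subset(wine, residual.sugar < 8)

# Histogram of the trimmed variable; the bin width of 0.5 is an arbitrary choice.
hist(wine_low_sugar$residual.sugar,
     breaks = seq(0, 8, by = 0.5),
     main = "Residual sugar below 8",
     xlab = "residual.sugar")
```

Trimming only changes the view of the distribution; the dropped rows are still
in `wine` for the rest of the analysis.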

Bivariate analysis

correlation of all properties

In the above plot we can clearly see a good amount of correlation between:

  • citric acid and acidity
  • density and acidity
  • free sulfur and total sulfur
  • chlorides and sulphates
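
A list like the one above can also be read off a correlation matrix directly.
The sketch below ranks variable pairs by absolute Pearson correlation. The
helper `strongest_pairs` is hypothetical, and the toy data frame is made up
purely so the code runs; it is not the wine data.

```r
# Toy data frame: made-up values, NOT the wine data.
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2),  # roughly 2 * a
  c = c(5, 1, 4, 2, 6, 3)                # unrelated to a
)

# Hypothetical helper: list the n variable pairs with the largest |r|.
strongest_pairs <- function(df, n = 3) {
  cm <- cor(df)                                 # Pearson correlation matrix
  idx <- which(upper.tri(cm), arr.ind = TRUE)   # each pair once, no diagonal
  pairs <- data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx]
  )
  pairs <- pairs[order(-abs(pairs$r)), ]        # strongest first
  head(pairs, n)
}

top <- strongest_pairs(toy)
print(top)
```

Applied to the numeric columns of the wine data, the first rows of the result
would be the candidates for the list above.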

citric acid vs fixed acidity

The above scatter plot shows that wines with more citric acid have higher
fixed acidity, which suggests that citric acid contributes to the acidity.
The outliers were removed to make the pattern easier to see.

Fixed Acidity and pH

The above scatter plot shows that wines with lower pH values have higher fixed
acidity.

quality vs alcohol

The bar plot shows that higher-quality wines have relatively higher alcohol
content than lower-quality wines.
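
The numbers behind such a bar plot can be computed by averaging alcohol within
each quality level. A minimal sketch, assuming a data frame with the `quality`
and `alcohol` columns from the summary; the name `wine` and the values are
made-up placeholders, not the real data.

```r
# Placeholder data frame: made-up values, NOT the real wine data.
wine <- data.frame(
  quality = c(5, 5, 6, 6, 7, 7),
  alcohol = c(9.4, 9.8, 10.2, 10.6, 11.4, 11.8)
)

# Mean alcohol content for each quality level.
alcohol_by_quality <- aggregate(alcohol ~ quality, data = wine, FUN = mean)
print(alcohol_by_quality)
```

Swapping `mean` for `median` makes the comparison less sensitive to the few
wines with unusually high alcohol.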

Acidity vs alcohol

The above plot shows that low-quality wines have more of a vinegar taste
(higher volatile acidity), but between quality 7 and 8 there is no significant
change.

quality vs chlorides

Most wines have a similar level of saltiness (chlorides), except quality 3,
whose box covers a much wider range.

Free sulfur vs total sulfur

Here we can see that more total sulfur dioxide also goes with more free
sulfur dioxide, but in this dataset most points are clustered at the low end,
which shows that most wines are low in sulfur.

density vs alcohol

There is a correlation between density and alcohol: the higher the density,
the lower the alcohol content.

volatile Acidity and pH

Acidity generally drives pH, but interestingly volatile acidity does not
appear to affect the pH of the wine.

alcohol and quality

This plot shows that there are different groups of wine: low quality with
moderate alcohol, moderate quality with less alcohol, and high quality with
more alcohol.

Bivariate analysis discussion

Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?

One of the interesting relationships I found in the dataset is that density
correlates with the quality of wine.

Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?

We know that acidity affects pH, but the interesting relationship I found is
that volatile acidity does not appear to affect the pH of the wine.

What was the strongest relationship you found?

  • citric acid and acidity
  • density and acidity
  • free sulfur and total sulfur
  • chlorides and sulphates

multivariate analysis

quality over acidity and pH

The above multivariate plot shows that quality 5 wines have higher acidity
than quality 7 wines.

citrus over pH and acidity

We can see that there is a correlation between acidity and pH.
I think citric acid is the reason for the acidity, because wines with lower
citric acid have less acidity and a higher pH value.
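
The citrus grouping used for this plot (the `citrusCut` variable named in the
Reflection) can be built by binning citric acid with `cut()`. This is a sketch
only: the break points below are an assumption taken from the minimum,
quartiles, and maximum in the summary, the labels are invented, and the data
values are made-up placeholders rather than the real data.

```r
# Placeholder data frame: made-up citric acid values, NOT the real wine data.
wine <- data.frame(citric.acid = c(0.00, 0.05, 0.10, 0.26, 0.30, 0.45, 1.00))

# Break points are an assumption: min, quartiles, and max from summary().
wine$citrusCut <- cut(wine$citric.acid,
                      breaks = c(0, 0.09, 0.26, 0.42, 1),
                      labels = c("low", "medium-low", "medium-high", "high"),
                      include.lowest = TRUE)  # keep wines with citric acid 0

print(table(wine$citrusCut))
```

The resulting factor can then be mapped to colour or facets in a plot of fixed
acidity against pH.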

quality over free sulfur and total sulfur

The streak of points is shorter for lower-quality wines, which shows that
lower-quality wines contain less sulfur.

citrus over free sulfur dioxide and total sulfur dioxide

The above grid plot shows that wines of quality 5 and 6 have more sulfur, and
also that they have less citric acid, which leads to less acidity.

quality over fixed acidity and density

Wines of quality 5 and 6 have high acidity and a wide range of density, while
wines of quality 3, 4, 7, and 8 have less acidity. Quality 8 has both less
acidity and lower density.

alcohol over acidity and density

The above plot shows that wines with lower density have higher alcohol
content; quality 7 wines in particular have lower density and higher alcohol
content.

Multivariate Analysis Discussion

Were there any interesting or surprising interactions between features?

Even though we have a decent (more or less equal) number of wines in both
quality 5 and 6, it is the alcohol content that separates the quality levels.

Final Plots and Summary

Plot One

Description One

The scatter plot shows that an increase in citric acid goes with an increase
in fixed acidity. The main reason I chose this plot is that I suspected
volatile acidity makes no difference to the pH value, so before looking at
that I wanted to see the relationship between citric acid and fixed acidity,
because citric acid contributes to fixed acidity.

Plot Two

Description Two

We can see that the dots are widely spread without any correlation, and it is
common for each quality of wine to have a wide range of volatile acidity.

Plot Three

Description Three

It shows that a high citric acid level goes with high acidity and a low pH
value; likewise, a low citric acid level goes with low acidity and a high pH
value. It also suggests that wine quality depends mainly on the citric acid
level of the wine.

Discussion on final plot

Volatile acidity does not affect the pH of wine, and volatile acidity is not
a factor affecting the quality of wine. Fixed acidity is one of the main
factors that decide the quality of wine. Fixed acidity is driven by citric
acid, so citric acid regulates the quality of wine.

Reflection

I started the analysis by looking at the summary of the data. There I found a
few variables with a large gap between the interquartile range and the maximum
value, which indicates outliers. Then I did the univariate analysis, where I
found that wine quality is not equally distributed, so we don’t have enough
data at each quality level. This is a big drawback, because we can’t draw
conclusions about every quality level.

Later, in the bivariate analysis, I made a correlation plot to see the
relationships between variables, and found the factors that mainly influence
the quality of wine: citric acid and acidity. While plotting volatile acidity
against pH I didn’t see much correlation, which raised the question of why
volatile acidity does not influence the pH of the wine.

I then proceeded with the multivariate analysis, where I created citrusCut as
a new variable and explored the relationship between fixed acidity, pH, and
citric acid. In the final plots I concluded that volatile acidity does not
affect the pH of the wine.

Future Analysis

I am thinking of finding a dataset where the wine quality levels are equally
distributed, and a more recent one, to continue my exploratory analysis.